Skip to content

Create Porting/MANIFEST.dev as a complement to MANIFEST #19513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: blead
Choose a base branch
from

Conversation

demerphq
Copy link
Collaborator

@demerphq demerphq commented Mar 10, 2022

This file is intended to list all the files in the repo which are not
listed in the main MANIFEST file, and which are used only for
development purposes, especially those files which are only useful when
working in a git checkout of the main perl git repository.

The files it contains will NOT be added to the production tarball
release. The file has the exact same format as the main MANIFEST:
"file\t+description" or "file".

Q. Why didn't I call this Porting/MANIFEST as mentioned in the
discussion thread that lead to this patch?

A. The main reason was that Porting/README.pod includes a list of files
in Porting with descriptions and explanations for what the files do
or how they are used. In several places the file refers to
"MANIFEST", which lead to ambiguity that would have had to be
resolved by changing all the entries to refer to "Porting/whatever"
instead. It was much simpler to give the new file an extension, and I
thought that '.dev' suggests it is for "development" purposes.

Q. Why isn't this using MANIFEST.skip style functionality?

A1. Various parts of our build and test process expect to read the
MANIFEST file and then do things based on the entries contained
within. Eg, run tests, or extract data, or compare the file list to
content in another file. Those parts of our build process would
break if we used a skip style list of regexen. So it would be more
work to teach them to deal with such a file, assuming it was
actually doable - given the additional work I have not considered it
deeply. On the other hand teaching that logic to simply read two
files was and is easy.

A2. I think each file we have in the repo should have a description.
This patch currently doesn't provide a description for each, but it
does for many, especially those migrated from MANIFEST.

A3. I think that MANIFEST.skip style files of exclusion regexens and
globs are error prone and easy to mess up, for instance by excluding
far more than you had intended to. They can also be annoying to get
right, obviously not impossible, but sometimes annoying. Explicitly
listing everything is easy in every way, especially to mechanize.

A4. I would like to be able to move verbatim entries from our existing
MANIFEST into the new Porting/MANIFEST.dev, description and all.
MANIFEST.skip style files do not support descriptions except as
comments as far as I recall. That would have meant munging the data
from MANIFEST during the move process which would be annoying.

A5. I would like to be able to reuse our sorting logic to keep the files
nicely sorted in a way where the file is somewhat readable. A list
of skip files would be less amenable to doing so.

Q. There is a lot of duplicated logic related to manifests, should we
refactor it out into a module or some resuable tool set?

A. YES! We already have Porting/manifest_lib.pm, but it currently does
not declare a package, and it only contains one function. Instead of
adding yet more code that depends on requiring a file and having it
inject subs into package main I decided that doing the refactoring
could wait for a separate commit or PR. But I definitely think we
should refactor as much of this logic as possible.

Q. Some of the test files were fairly significantly changed, are you
sure you didn't break or drop any of the tests?

A. I am reasonably confident I did not. Secondary review appreciated.
Some of the touched files are quite old and obviously "quick hack"
scripts. By rewriting them quite a bit I was able to simplify and
perform some of the tests in different ways or parts of the script.
As far as I know I didn't drop any.

Q. Why didn't you use newer features in the rewrite?

A. I am a bit conservative in my taste, and I like build tools to be
able to run on older perls, and for things like this I prefer to
stick with what I know well. Patches welcome.

Q. Why didn't you move more of the stuff we shouldn't bundle with
our releases?

A. I figured someone like Nicolas R. (who helped motivate this patch)
would feel left out if I didn't leave him anything to do. :-)

@demerphq demerphq added the Work In Progress This PR is in progress and is not meant to be merged yet. label Mar 10, 2022
@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

I don't think this is the right solution. We should be reusing out existing MANIFEST.SKIP infrastructure and conventions instead of reinventing a wheel.

@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

I don't think this is the right solution. We should be reusing out existing MANIFEST.SKIP infrastructure and conventions instead of reinventing a wheel.

Essentially what we should do is just

use ExtUtils::Manifest;
my $skip = maniskip;
chomp(my @git-files = `git ls-files`);
my @files = grep !$skip->($_), @git-files;

@demerphq
Copy link
Collaborator Author

demerphq commented Mar 10, 2022 via email

@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

I personally would prefer a proper
MANIFEST file for a number of reasons including a proper description
of each file

I do agree with that

@demerphq demerphq force-pushed the yves/manifest_dev branch from d5b785f to c155bcc Compare March 10, 2022 17:05
@demerphq demerphq added build tooling and removed Work In Progress This PR is in progress and is not meant to be merged yet. labels Mar 10, 2022
@demerphq
Copy link
Collaborator Author

Ok, I have removed the "Work In Progress" flag, and I am ready to debate this now. Thank you for your patience. :-)

Please review the PR message (same as the commit message), it explains why I don't think that this should be a MANIFEST.skip functionality. The strongest argument IMO is A1, which I will paraphrase here:

Quite a bit of our build tooling expects to be able to read the MANIFEST file and then do stuff based on the contents. MANIFEST.skip is designed to exclude things from the creation or updates of a MANIFEST, usually via "make manifest" in a normal CPAN dist (which our Makefile does not support!). But we want to use these lists of files to do further processing, so a list of regexes and file globs doesn't really cut it. Especially in terms of ease of conversion to use the new files having two manifest like files is much easier to deal with, all the places we load the current MANIFEST can be changed to load both and operate on the union. Doing the same thing with skip file would IMO take a lot more effort for minimal benefit. In other words while it might seem like using maniskip() type functionality is suited to the task, IMO it really isn't.

There are other IMO good reasons this should not "just" reuse the maniskip functionality, and I have listed them all in the commit message, but the fact that we want to operate and process the list of files is the strongest.

@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

This does a bunch of tooling changes, a bunch of changes to what files are shipped in a tarball, a few additions to the descriptions of files in the MANIFEST and updating a piece of documentation. Those should at the very least be split into 4 separate commits (and possibly the tooling changes could be split further)

@karenetheridge
Copy link
Member

Various parts of our build and test process expect to read the MANIFEST file and then do things based on the entries contained within.

What are these things, that might be adversely affected by listing some Porting/* files in MANIFEST.SKIP?

I think that MANIFEST.skip style files of exclusion regexens and globs are error prone and easy to mess up, for instance by excluding far more than you had intended to

MANIFEST.SKIP can also handle literal entries, without wildcards, if it is more convenient to list every file out longhand.

@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

What are these things, that might be adversely affected by listing some Porting/* files in MANIFEST.SKIP?

As far as I can tell that's only t/porting/exec-bit.t, except it's wrong. Porting/exec-bit.txt is what we use to set +x bits when building a release, so files that aren't part of a release shouldn't be listed in it. So it doesn't need the skip-list after all.

@Leont
Copy link
Contributor

Leont commented Mar 10, 2022

Q. Why isn't this using MANIFEST.skip style functionality?

A1. Various parts of our build and test process expect to read the MANIFEST file and then do things based on the entries contained within. Eg, run tests, or extract data, or compare the file list to content in another file. Those parts of our build process would break if we used a skip style list of regexen. So it would be more work to teach them to deal with such a file, assuming it was actually doable - given the additional work I have not considered it deeply. On the other hand teaching that logic to simply read two files was and is easy.

That seems to assume that using MANIFEST.SKIP means not using MANIFEST. I don't think anyone has suggested doing away with the latter. We would use MANIFEST.SKIP for two reasons: regenerating the MANIFEST, and testing it for correctness. Anything else would use MANIFEST as it does now.

A2. I think each file we have in the repo should have a description. This patch currently doesn't provide a description for each, but it does for many, especially those migrated from MANIFEST.

Descriptions can be useful, but I really don't see the value of «.gitignore rules for the Digest-SHA cpan distribution» for dozens of cpan distributions.

A3. I think that MANIFEST.skip style files of exclusion regexens and globs are error prone and easy to mess up, for instance by excluding far more than you had intended to. They can also be annoying to get right, obviously not impossible, but sometimes annoying. Explicitly listing everything is easy in every way, especially to mechanize.

We have tests. We are checking for correctness. I don't see how this would suddenly become error-prone. Again, this seems to assume that there wouldn't be a MANIFEST file, which is not how MANIFEST.SKIP is typically used.

A4. I would like to be able to move verbatim entries from our existing MANIFEST into the new Porting/MANIFEST.dev, description and all. MANIFEST.skip style files do not support descriptions except as comments as far as I recall. That would have meant munging the data from MANIFEST during the move process which would be annoying.

As far as I know, MANIFEST.SKIP does support descriptions (though it isn't well-documented).

Add a new convenience option to manisort for a common case

  manisort --fix=MANIFEST_FILE

is the same as

  manisort --nocheck --output=MANIFEST_FILE MANIFEST_FILE

Eg, it runs quietly, returns an exit code of 0 on success and
rewrites the file in correctly sorted order while removing
true dupes.
The --fix option returns true if it worked, and so the use of
the true command is not required.
Even though these pods are dead we should keep their descriptions
for the manifest.
@demerphq demerphq force-pushed the yves/manifest_dev branch from c155bcc to 21a660d Compare March 11, 2022 04:51
@demerphq
Copy link
Collaborator Author

demerphq commented Mar 11, 2022

t/porting/readme.t and t/porting/readme.t. Also some of our test logic uses MANIFEST to get the list of tests to run. So far we havent moved any t files into Porting/MANIFEST.dev but it seems reasonable that we will. I expect @atoomic will be pushing some patches on top of this branch to do things like that.

You are welcome to migrate this all to use MANIFEST.skip instead, I don't think its worth the effort.

@demerphq
Copy link
Collaborator Author

I found another 7 places where we use the manifest file to drive testing or other processing. I really don't get how you propose to make this work with MANIFEST and MANIFEST.SKIP. The idea of MANIFEST.SKIP is to designate files that should NOT be included in the manifest. So for the places where we want to have a way to get a list of the things we would have to abuse the MANIFEST.SKIP functionality to do so, or rewrite code to walk the disk or something, which brings additional side-issues with it. So likely we would have to read two file formats. The work I just did was mostly mechanical, to make the code read two files and not one.

Nothing i said implied that using MANIFEST.SKIP would NOT involve using MANIFEST, it implies that MANIFEST would be missing files, and thus in the places where we want to read a manifest of ALL the files, including those not part of the final tarball, that we would have to read that list from two totally differently formatted files. Your way would involve more complex processing, and likely more complex changes. I didn't want to get into analyzing whether choices made by previous devs in those files was sensible, for instance your point about the execute-bit test, somebody decided it was a good idea to have files in Porting check that the files were listed there. I don't know if that was a good decision or not, ill leave that to someone else.

Regarding your comment about descriptions for .gitignore, sure, that doesn't add that much value, I just did that for completeness and to establish the trend that things should be described. And in one of the comments I included some information about how to get documentation on .gitignore, which I think is useful, likewise on .gitattributes, and I described .git_patch, and the other files in the list do have useful descriptions. So I don't really think your point about .gitignore is very helpful to the discussion. You seem to think this discussion is about having a way to specify the things we ignore in certain places. IMO that is not the point, the point to me is how do we have our cake, a fully documented MANIFEST like database of our files, and still be able to distinguish between files that should be in the tarball and final release and those that shouldn't, and how and where do we authoritatively document and list the files. The fact that previously we listed almost everything in MANIFEST and then in a few places where tested that MANIFEST listed everything we filtered out a few special cases isn't in my eyes representative of the core problem I want to solve here, if it was then I admit MANIFEST.SKIP would be the way to go. In fact I could easily imagine my plan coupling with the use of a MANIFEST.SKIP file as well, for the few places where we want to look at the union of all our files and then exclude some files based on regex. There are two such examples in the code I just pushed. But that wouldn't make MANIFEST.dev go away, it would add a third data file. Personally i don't think there is evidence /right now/ that we should use such a complex a mechanism, the two cases are very specific. One of them might even totally redundant.

Maybe your proposal would "perfect", but IMO this is a good case of perfect is the enemy of good enough, this is good enough, it works, and it passes test, it is conceptually simple, and it scratches the itch that was expressed. Also, my plan has one big advantage over your plane, mine is already done. Maybe you want to put together a patch that does it your way to prove your point, but I don't think your plan is as good an idea as you do so I won't be doing it that way myself.

The comment says the highest exit code we should return is 124, but the
code will return 125 if there are 125 problems, if it is higher than 125
it will return 124. Which doesn't make sense.

This patch changes the logic to do what the comment says, return 124 if
there are 124 or more problems, otherwise it returns the count of
problems.
…nfra

This file is intended to list all the files in the repo which are not
listed in the main MANIFEST file, and which are used only for
development purposes, especially those files which are only useful when
working in a git checkout of the main perl git repository.

The files it contains will NOT be added to the production tarball
release. The file has the exact same format as the main MANIFEST:
"file\t+description" or "file".

Q. Why didn't I call this Porting/MANIFEST as mentioned in the
   discussion thread that lead to this patch?

A. The main reason was that Porting/README.pod includes a list of files
   in Porting with descriptions and explanations for what the files do
   or how they are used. In several places the file refers to
   "MANIFEST", which lead to ambiguity that would have had to be
   resolved by changing all the entries to refer to "Porting/whatever"
   instead. It was much simpler to give the new file an extension, and I
   thought that '.dev' suggests it is for "development" purposes.

Q. Why isn't this using MANIFEST.skip style functionality?

A1. Various parts of our build and test process expect to read the
    MANIFEST file and then do things based on the entries contained
    within. Eg, run tests, or extract data, or compare the file list to
    content in another file. Those parts of our build process would
    break if we used a skip style list of regexen. So it would be more
    work to teach them to deal with such a file, assuming it was
    actually doable - given the additional work I have not considered it
    deeply. On the other hand teaching that logic to simply read two
    files was and is easy.

A2. I think each file we have in the repo should have a description.
    This patch currently doesn't provide a description for each, but it
    does for many, especially those migrated from MANIFEST.

A3. I think that MANIFEST.skip style files of exclusion regexens and
    globs are error prone and easy to mess up, for instance by excluding
    far more than you had intended to. They can also be annoying to get
    right, obviously not impossible, but sometimes annoying. Explicitly
    listing everything is easy in every way, especially to mechanize.

A4. I would like to be able to move verbatim entries from our existing
    MANIFEST into the new Porting/MANIFEST.dev, description and all.
    MANIFEST.skip style files do not support descriptions except as
    comments as far as I recall. That would have meant munging the data
    from MANIFEST during the move process which would be annoying.

A5. I would like to be able to reuse our sorting logic to keep the files
    nicely sorted in a way where the file is somewhat readable. A list
    of skip files would be less amenable to doing so.

Q. There is a lot of duplicated logic related to testing manifests,
   should we refactor it out into a module or some resuable tool set?

A. YES! We already have Porting/manifest_lib.pm, but it currently does
   not declare a package, and it only contains one function. Instead of
   adding yet more code that depends on requiring a file and having it
   inject subs into package main I decided that doing the refactoring
   could wait for a separate commit or PR. But I definitely think we
   should refactor as much of this logic as possible.

Q. Some of the test files were fairly significantly changed, are you
   sure you didn't break or drop any of the tests?

A. I am reasonably confident I did not. Secondary review appreciated.
   Some of the touched files are quite old and obviously "quick hack"
   scripts. By rewriting them quite a bit I was able to simplify and
   perform some of the tests in different ways or parts of the script.
   As far as I know I didn't drop any.

Q. Why didn't you use newer features in the rewrite?

A. I am a bit conservative in my taste, and I like build tools to be
   able to run on older perls, and for things like this I prefer to
   stick with what I know well. Patches welcome.

Q. Why didn't you move more of the stuff we shouldn't bundle with
   our releases?

A. I figured someone like Nicolas R. (who helped motivate this patch)
   would feel left out if I didn't leave him anything to do. :-)
Patch best viewed with -w
@demerphq demerphq force-pushed the yves/manifest_dev branch from 57bbb9b to ff9e589 Compare March 11, 2022 08:25
@Leont
Copy link
Contributor

Leont commented Mar 11, 2022

Also, my plan has one big advantage over your plane, mine is already done.

I can't overstate how annoyed I am at this statement, even if I was expecting it.

Not just because ignores the obvious and entirely valid third option of leaving things the way they are, but primarily because it is a rather hostile way of cooperation. You start with «But for now I'd rather not debate it with you. When the PR goes out of "work in progress" I will be happy to discuss more.», and then the next day you come up with this "I already wrote it so we might as well merge it".

Frankly, if you can't even pretend to be interested in actually listening to what other people are saying I'm not going to waste my time and energy on this discussion and would rather suggest for this ticket to be closed.

Maybe you want to put together a patch that does it your way to prove your point, but I don't think your plan is as good an idea as you do so I won't be doing it that way myself.

Sure: #19523

@Leont
Copy link
Contributor

Leont commented Mar 11, 2022

Also, if you're going to make fundamental changes to how tarballs are built, you probably want to double check if that tarball works. Right now it fails to build a tarball, but even if it didn't I am expecting it to fail some of those porting tests because of missing files.

@demerphq
Copy link
Collaborator Author

demerphq commented Mar 12, 2022 via email

@demerphq
Copy link
Collaborator Author

demerphq commented Mar 12, 2022 via email

atoomic added a commit that referenced this pull request Mar 17, 2022
This change is merging the two ideas from #19523 and #19513
by relying on a traditional MANIFEST.SKIP file.

Case #19523 simplifies the way we can easily exclude common noise
Case #19513 provides a mechanism to avoid shipping not necessary
files as part of the tarball

This is adding a new 'Porting/Manifest.pm' file which provides helpers
to list files from MANIFEST taking into account the 'MANIFEST.SKIP'
using ExtUtils::Manifest.
@atoomic
Copy link
Member

atoomic commented Mar 17, 2022

Note: #19542 provides one suggestion to merge both ideas from #19513 and #19523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants